Add Numeric Feature #4

Merged
simba-git merged 16 commits into develop from feature/simba/numeric_feature
May 7, 2020
Conversation

@simba-git

Numeric features allow a user to take their source tables and easily apply a variety of feature engineering techniques to them. This effectively separates general data engineering tasks (e.g. joining the actions and users tables) from data science-oriented ones (e.g. filling missing values with the mean value, truncating outliers, and min-max scaling the column).
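
To make the three data science-oriented steps concrete, here is a minimal sketch in plain Python. It is not the StreamSQL API from this PR: the function names, the sample `ages` column, and the truncation bounds of 0 and 100 are all assumptions chosen for illustration.

```python
from statistics import mean


def fill_missing_with_mean(values):
    """Replace None entries with the mean of the values that are present."""
    present = [v for v in values if v is not None]
    fill = mean(present)
    return [fill if v is None else v for v in values]


def truncate(values, lower, upper):
    """Clamp every value into the closed range [lower, upper]."""
    return [min(max(v, lower), upper) for v in values]


def min_max_scale(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column has no spread to scale; map everything to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical source column with one missing value and one outlier.
ages = [20, None, 30, 40, 1000]

filled = fill_missing_with_mean(ages)    # None becomes 272.5
clipped = truncate(filled, 0, 100)       # 272.5 and 1000 both become 100
scaled = min_max_scale(clipped)          # [0.0, 1.0, 0.125, 0.25, 1.0]
```

Note that the order of the steps matters: here the mean is computed before the outlier is truncated, so the filled value is itself pulled toward the outlier and then clipped.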

Simba added 16 commits May 6, 2020 10:00
The column parameter is useful for normalization, binning, and other
operations. Adding it to the simple operations keeps the APIs consistent.
Operations will be used as default parameters in features. Making it
an import allows them to be used without changing file ordering.
NoOp operation is cleaner than checking for None. It's also a nice
utility to match the interface with an operation that does nothing.
In the local tests, the feature name was inconsistent with the source
column name.
MinMax will be used as a scaling operation for Feature.
Forgot to do it before last commit.
This makes the column have a different mean and median value to help
testing.
The Sqrt object shouldn't be used.
These operations will be used to fill missing values.
These two operations provide a basis for outlier handling.
Truncate is slightly clearer and more consistent in the numeric feature
definition
Normalizing, truncating, etc. are all operations. This can lead to
confusion, whereas transform is clearer.
This allows a variety of typical feature engineering techniques to be
applied using StreamSQL.
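
The commit notes above mention using a NoOp operation as a default parameter instead of checking for None. The following is a rough sketch of that pattern only. The class names `NoOp`, `Truncate`, and `NumericFeature`, the `apply` method, and the constructor parameters are hypothetical and are not taken from the merged code.

```python
class NoOp:
    """An operation that matches the shared interface but changes nothing."""

    def apply(self, values):
        return list(values)


class Truncate:
    """Clamp values into [lower, upper]; a basic form of outlier handling."""

    def __init__(self, lower, upper):
        self.lower = lower
        self.upper = upper

    def apply(self, values):
        return [min(max(v, self.lower), self.upper) for v in values]


class NumericFeature:
    """Hypothetical feature that runs its configured steps in order."""

    def __init__(self, column, truncate=NoOp(), scale=NoOp()):
        # NoOp is stateless, so sharing one default instance is safe.
        self.column = column
        self.steps = [truncate, scale]

    def apply(self, values):
        # No "if step is not None" branches: every step has an apply method.
        for step in self.steps:
            values = step.apply(values)
        return values


untouched = NumericFeature("age").apply([1, 2, 3])
clipped = NumericFeature("age", truncate=Truncate(0, 2)).apply([1, 2, 3])
```

With NoOp as the default, the feature treats every step the same way, and a caller who configures nothing gets the column back unchanged.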
@simba-git simba-git added this to the v0.0.0a1 milestone May 7, 2020
@simba-git simba-git changed the title Adds Numeric Feature Add Numeric Feature May 7, 2020
@simba-git simba-git merged commit cb51a64 into develop May 7, 2020
@simba-git simba-git deleted the feature/simba/numeric_feature branch May 7, 2020 05:07